Improving the Efficiency of Frequent Pattern Mining by Compact Data Structure Design

Authors

Raj P. Gopalan

Yudho Giri Sucahyo

Abstract

Mining frequent patterns has been a topic of active research because it is computationally the most expensive step in association rule discovery. In this paper, we discuss the use of compact data structure design for improving the efficiency of frequent pattern mining. The discussion is based on our work in developing efficient algorithms that outperform the best available frequent pattern algorithms on a number of typical data sets. We discuss improvements to the data structure design that have resulted in faster frequent pattern discovery. The performance of our algorithms is studied by comparing their running times on typical test data sets against those of the fastest Apriori, Eclat, FP-Growth and OpportuneProject algorithms. We discuss the performance results as well as the strengths and limitations of our algorithms.
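
The abstract does not describe the authors' data structures in detail, so the sketch below is not their design. It only illustrates the general idea behind FP-Growth-style compact structures: one pass counts item supports, infrequent items are dropped, and each transaction's remaining items are inserted into a prefix tree in one fixed global order, so transactions that share a prefix share nodes and only increment counts. The names `Node`, `build_prefix_tree`, `count_nodes` and `support` are hypothetical, and breaking support ties by item name is an assumption made for determinism.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple


class Node:
    """Prefix-tree node: one item, the number of transactions passing through it, its children."""

    __slots__ = ("item", "count", "children")

    def __init__(self, item: Optional[str]) -> None:
        self.item = item
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_prefix_tree(
    transactions: Iterable[Iterable[str]], min_support: int
) -> Tuple[Node, Dict[str, int]]:
    """Build a compact prefix tree; returns the root and the global rank of each frequent item."""
    txs = [set(t) for t in transactions]  # drop duplicate items, allow two passes
    # Pass 1: support of every single item.
    counts = Counter(item for t in txs for item in t)
    frequent = [i for i, c in counts.items() if c >= min_support]
    # Global order: descending support, ties broken by item name (an assumption).
    order = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None)
    # Pass 2: insert each transaction's frequent items in rank order, so common
    # prefixes are stored once and only their counts grow.
    for t in txs:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    return root, rank


def count_nodes(node: Node) -> int:
    """Number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


def support(root: Node, itemset: Set[str], rank: Dict[str, int]) -> int:
    """Number of transactions containing every item of `itemset`, read from the tree."""
    target = frozenset(itemset)
    if not target or not all(i in rank for i in target):
        raise ValueError("itemset must be non-empty and contain only frequent items")
    last = max(target, key=rank.__getitem__)  # item of the itemset deepest on any path
    total = 0

    def walk(node: Node, seen: FrozenSet[str]) -> None:
        nonlocal total
        for child in node.children.values():
            if rank[child.item] > rank[last]:
                continue  # paths are rank-sorted, so `last` cannot occur below here
            now = (seen | {child.item}) if child.item in target else seen
            if child.item == last:
                if now == target:
                    total += child.count  # every transaction through here contains the itemset
            else:
                walk(child, now)

    walk(root, frozenset())
    return total
```

For example, with the transactions `[a, b, c]`, `[a, b]`, `[a, c]`, `[b, c]` and `[a, b, c, d]` and `min_support=2`, item `d` is dropped, the 12 remaining item occurrences are stored in 6 non-root nodes, and `support(root, {"a", "b"}, rank)` returns 3. Reading supports back from node counts, rather than rescanning the transactions, is the kind of saving that compact structures aim for.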

Similar resources

Mining maximal frequent itemsets from data streams

Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The defectiveness of such a two-phase framew...

On Computing Condensed Frequent Pattern Bases

Frequent pattern mining has been studied extensively. However, the effectiveness and efficiency of this mining is often limited, since the number of frequent patterns generated is often too large. In many applications it is sufficient to generate and examine only frequent patterns with support frequency in close-enough approximation instead of in full precision. Such a compact but close-enough ...

A Compact FP-Tree for Fast Frequent Pattern Retrieval

Frequent patterns are useful in many data mining problems including query suggestion. Frequent patterns can be mined through the frequent pattern tree (FP-tree) data structure, which is used to store a compact (or compressed) representation of a transaction database (Han et al., 2000). In this paper, we propose an algorithm to compress frequent pattern set into a smaller one, and store the set in a...

Indexed Bit Map (IBM) for Mining Frequent Sequences

Sequential pattern mining has been an emerging problem in data mining. In this paper, we propose a new algorithm for mining frequent sequences. It performs only one scan of the database thanks to an indexed structure associated with a bitmap representation. Thus, it allows fast data access and compact storage in main memory. This algorithm has been applied to activity sequences belonging to...

COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. Some of these major problems are: (1) the repetitive I/O disk scans, (2) the huge computation involved during the candidacy generation, and (3) the high memory dependency. This paper presents the implementation of our frequent itemset mining algorithm, COFI, which achieves its effic...

Publication date: 2003
